Add GAX client_config to service factory #848
Conversation
@timanovsky This isn't Datastore, but this is the approach we are considering for configuring GRPC/GAX clients. Can you look at this and let us know what you think? Is the ability to configure the client at this level worth the risk of future breaking changes due to how leaky this approach is?
@jmuk Can you weigh in on this? How comfortable are you with gcloud-ruby leaking the GAX configuration this way? How likely are we to see breaking changes to the configuration?
In my feeling, the format is almost stable. It's unlikely to change, and I'd say it's safe to expose to the users.
@jmuk Are the code examples in the PR description accurate?
Yes, the format in the commit message is correct.
@bmclean Any input on this?
Coverage increased (+0.001%) to 95.656% when pulling 378c09e00c694589025ff1aebc32e481b23158d0 on blowmage:language-client_config into 956fbba on GoogleCloudPlatform:master.
@blowmage I like the approach of defining which error codes get retried. We currently retry automatically at the application level. I don't know if we would make use of the fine-grained control proposed in #849. For Datastore calls critical to our business I think we would want to keep the retry at the application level.
@bmclean: Thanks for the input!
@timanovsky Can you look this over and give us your thoughts on exposing this level of configuration? This would replace the current retries setting.
I don't see how to specify the number of retries. I see a time deadline, but not the number of attempts desired. Also, can you provide a sensible default list of exceptions to retry on? My understanding of which errors are retryable comes from https://cloud.google.com/datastore/docs/concepts/errors#error_codes (not sure about the last one).
Perhaps @jmuk can give us some more explanation of the configuration and what the error strings are that can be set for retry.
Right, that's the change with this new configuration. There is no single value for the number of retries. The number of retries would have to be calculated from the following parameters:
"retry_params" => {
"default" => {
"initial_retry_delay_millis" => 100,
"retry_delay_multiplier" => 1.3,
"max_retry_delay_millis" => 60000,
"initial_rpc_timeout_millis" => 60000,
"rpc_timeout_multiplier" => 1.0,
"max_rpc_timeout_millis" => 60000,
"total_timeout_millis" => 600000
}
}
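To make the point concrete, an effective attempt count can be derived from these parameters. This is a back-of-the-envelope sketch, not GAX code; it assumes retries continue until total_timeout_millis is exhausted and ignores the time spent in the RPCs themselves:

```ruby
# Replay the retry schedule implied by the retry_params above.
# Not GAX code; a sketch under the assumptions stated in the lead-in.
def estimated_attempts params
  delay    = params["initial_retry_delay_millis"].to_f
  elapsed  = 0.0
  attempts = 0
  while elapsed + delay <= params["total_timeout_millis"]
    elapsed  += delay
    attempts += 1
    # Delay grows geometrically, capped at max_retry_delay_millis.
    delay = [delay * params["retry_delay_multiplier"],
             params["max_retry_delay_millis"]].min
  end
  attempts
end

params = {
  "initial_retry_delay_millis" => 100,
  "retry_delay_multiplier"     => 1.3,
  "max_retry_delay_millis"     => 60_000,
  "total_timeout_millis"       => 600_000
}

estimated_attempts params  # roughly 30 attempts with these defaults
```

So under the defaults the delay ramps from 100 ms up to the 60-second cap and then repeats at that cap until the 10-minute total timeout is spent; the attempt count falls out of those values rather than being set directly.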
@timanovsky For reference, here is the entire GAX JSON configuration for Datastore, since you are likely most familiar with it:
{
"interfaces": {
"google.datastore.v1.Datastore": {
"retry_codes": {
"retry_codes_def": {
"idempotent": [
"DEADLINE_EXCEEDED",
"UNAVAILABLE"
],
"non_idempotent": []
}
},
"retry_params": {
"default": {
"initial_retry_delay_millis": 100,
"retry_delay_multiplier": 1.3,
"max_retry_delay_millis": 60000,
"initial_rpc_timeout_millis": 60000,
"rpc_timeout_multiplier": 1.0,
"max_rpc_timeout_millis": 60000,
"total_timeout_millis": 600000
}
},
"methods": {
"Lookup": {
"timeout_millis": 60000,
"retry_codes_name": "idempotent",
"retry_params_name": "default"
},
"RunQuery": {
"timeout_millis": 60000,
"retry_codes_name": "idempotent",
"retry_params_name": "default"
},
"BeginTransaction": {
"timeout_millis": 60000,
"retry_codes_name": "non_idempotent",
"retry_params_name": "default"
},
"Commit": {
"timeout_millis": 60000,
"retry_codes_name": "non_idempotent",
"retry_params_name": "default"
},
"Rollback": {
"timeout_millis": 60000,
"retry_codes_name": "non_idempotent",
"retry_params_name": "default"
},
"AllocateIds": {
"timeout_millis": 60000,
"retry_codes_name": "non_idempotent",
"retry_params_name": "default"
}
}
}
}
}
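As a reading aid for the config above, each method names a retry-code set via retry_codes_name, which resolves through retry_codes_def. The resolver below is a sketch for illustration, not the GAX lookup code:

```ruby
# A condensed slice of the config above, kept to the keys involved
# in resolving a method's retryable codes.
config = {
  "interfaces" => {
    "google.datastore.v1.Datastore" => {
      "retry_codes" => {
        "retry_codes_def" => {
          "idempotent"     => ["DEADLINE_EXCEEDED", "UNAVAILABLE"],
          "non_idempotent" => []
        }
      },
      "methods" => {
        "Lookup" => { "retry_codes_name" => "idempotent" },
        "Commit" => { "retry_codes_name" => "non_idempotent" }
      }
    }
  }
}

# Follow retry_codes_name from the method entry to retry_codes_def.
def retry_codes_for config, interface, method
  iface = config["interfaces"][interface]
  name  = iface["methods"][method]["retry_codes_name"]
  iface["retry_codes"]["retry_codes_def"][name]
end

retry_codes_for config, "google.datastore.v1.Datastore", "Lookup"
# => ["DEADLINE_EXCEEDED", "UNAVAILABLE"]
```

So Lookup and RunQuery (idempotent) retry on DEADLINE_EXCEEDED and UNAVAILABLE, while Commit and the other non-idempotent methods retry on nothing.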
@timanovsky The way I read the configuration, the time between retries will increment until the max is reached, and then it will keep retrying at that max interval. But there doesn't seem to be any mechanism for total number of retries. @jmuk can you confirm?
I'm not certain about this, but
This is quite an uncommon approach. If it remains so, I guess I will need to keep using application-level retry logic.
@jmuk Regarding @timanovsky's comment above, is there any flexibility in the design of the GAX retry behavior?
The interval between retries is increased exponentially (by retry_delay_multiplier). Other errors are normally not retried automatically; developers will have to look into the details to fix the problem (and so they are documented as "Do not retry without fixing the problem"). ABORTED can be added for some methods too, although I'm not sure it's actually used for non-transactional APIs.
The config has a 'retry_codes_def' section which specifies the retryable errors, but the retry settings are the same for all failures of a method; per-failure retry logic is not implemented (and that would be complicated when combined with exponential backoff).
Thanks @jmuk. Is there a better list of all the gRPC error codes than this?
How does
@jmuk Any input?
@@ -110,7 +110,7 @@ def self.language project = nil, keyfile = nil, scope: nil, retries: nil,
   end
   Google::Cloud::Language::Project.new(
     Google::Cloud::Language::Service.new(
-      project, credentials, retries: retries, timeout: timeout))
+      project, credentials, timeout: timeout, client_config: client_config))
Sorry for the delay. Yes, that link will be the reference for gRPC errors, but in the context of Google API clients, https://cloud.google.com/datastore/docs/concepts/errors#error_codes would be the better reference.
If it's set as 5 seconds and the server does not return in that time,
Thanks @jmuk, that is perfect. I still think this approach has more value than using a fixed retry parameter. It gives you exponential back-off on the desired error codes (I am thinking of
@blowmage I am now thinking that we would also have use cases for the per-API-request approach outlined in #849.
Replace the retries value with the GAX client configuration. This leaks the GAX implementation to the google-cloud public API. If GAX changes its implementation, this may result in a change to the google-cloud public API.
Force-pushed from 378c09e to d448dbf
After considering the options, I agree that this is probably the best way forward. I am going to merge this PR, and I think we should use it as the basis for configuring retries in other services that we convert to GAX.
GAX does not directly support an integer retries value. [refs googleapis#848]
Language is the first service using GAX. GAX implements retries and incremental backoff, and our original plan was to make it conform to the retries value used by other services. However, GAX is different enough that it probably isn't feasible to share the same configuration approach as Google API Client. Therefore, here is an option for allowing users to configure the GAX client while continuing to use Google Cloud. See also #849 for overriding the GAX configuration on API calls. Either approach is independent of the other.

This change replaces retries with client_config; it effectively leaks the GAX implementation to the Google Cloud Language public API. This means that if GAX ever changes its implementation, Google Cloud Language may also need to make a breaking change to support the new GAX API. So far we have resisted leaking these types of details, but there is no better way to allow the GAX client to be fully configurable.

What do you think of this type of change? Is this a direction you want to see the other GRPC/GAX services move towards?

The client_config hash can contain any part of the config JSON that is intended to be overridden. Here is what this may look like:
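A sketch of such a partial override follows. The interface and method names here are assumptions for illustration, and the override is assumed to be deep-merged over the service's default config, so only the keys you set change:

```ruby
# Hypothetical partial override; keys follow the GAX JSON config
# format shown earlier in the thread. Names are illustrative.
client_config = {
  "interfaces" => {
    "google.cloud.language.v1.LanguageService" => {
      "methods" => {
        "AnnotateText" => { "timeout_millis" => 10_000 }
      }
    }
  }
}

# A plain deep merge sketching the intended "override only what you
# set" semantics over a defaults hash.
def deep_merge defaults, overrides
  defaults.merge(overrides) do |_key, old_val, new_val|
    if old_val.is_a?(Hash) && new_val.is_a?(Hash)
      deep_merge old_val, new_val
    else
      new_val
    end
  end
end
```

The resulting hash would then be passed to the factory as the client_config keyword, as in the diff above.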